NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Toward In-Context Teaching: Adapting Examples to Students’ Misconceptions

Ross, Alexis; Andreas, Jacob (August 2024, Proceedings of the Annual Meeting of the Association for Computational Linguistics)
Learning Phonotactics from Linguistic Informants

Breiss, Canaan; Ross, Alexis; Maina-Kilaas, Amani; Levy, Roger; Andreas, Jacob (June 2024, Society for Computation in Linguistics)

We propose an interactive approach to language learning that utilizes linguistic acceptability judgments from an informant (a competent language user) to learn a grammar. Given a grammar formalism and a framework for synthesizing data, our model iteratively selects or synthesizes a data-point according to one of a range of information-theoretic policies, asks the informant for a binary judgment, and updates its own parameters in preparation for the next query. We demonstrate the effectiveness of our model in the domain of phonotactics, the rules governing what kinds of sound-sequences are acceptable in a language, and carry out two experiments, one with typologically-natural linguistic data and another with a range of procedurally-generated languages. We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, and sometimes greater than, fully supervised approaches.
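The select-query-update loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes a hypothetical toy grammar formalism in which each candidate grammar bans a single bigram of segments, a noiseless informant, a finite hypothesis space with a uniform prior, and one simple information-theoretic policy: query the candidate whose predicted judgment has maximal entropy under the current posterior. All names (`SEGMENTS`, `acceptable`, `select_query`, `update`, `learn`) are hypothetical.

```python
import itertools
import math

# Hypothetical toy setup: a word is a string of segments, and each candidate
# grammar bans exactly one bigram (pair of adjacent segments).
SEGMENTS = ["p", "t", "a", "i"]
HYPOTHESES = [frozenset([bigram]) for bigram in itertools.product(SEGMENTS, repeat=2)]


def acceptable(word, banned):
    """A word is acceptable if none of its adjacent segment pairs is banned."""
    return all((a, b) not in banned for a, b in zip(word, word[1:]))


def binary_entropy(p):
    """Entropy in bits of a yes/no judgment that is 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_query(candidates, posterior):
    """Choose the candidate whose predicted judgment is most uncertain."""
    def p_accept(word):
        return sum(w for h, w in posterior.items() if acceptable(word, h))
    return max(candidates, key=lambda word: binary_entropy(p_accept(word)))


def update(posterior, word, judgment):
    """Noiseless Bayesian update: drop inconsistent grammars, renormalize."""
    kept = {h: w for h, w in posterior.items() if acceptable(word, h) == judgment}
    total = sum(kept.values())
    return {h: w / total for h, w in kept.items()}


def learn(informant, candidates, max_queries=20):
    """Query the informant until one grammar remains or the budget runs out."""
    posterior = {h: 1 / len(HYPOTHESES) for h in HYPOTHESES}
    for _ in range(max_queries):
        if len(posterior) == 1:
            break
        query = select_query(candidates, posterior)
        posterior = update(posterior, query, informant(query))
    return posterior


# Usage: the (hypothetical) informant's language bans the sequence "ta".
true_grammar = frozenset([("t", "a")])
candidates = ["".join(pair) for pair in itertools.product(SEGMENTS, repeat=2)]
posterior = learn(lambda word: acceptable(word, true_grammar), candidates)
print(posterior)
```

With a noiseless informant, the entropy of the predicted judgment equals the expected information gain about the grammar, which is why this policy is a reasonable stand-in here. A realistic learner would also need to handle noisy judgments and synthesize new candidates rather than choose from a fixed list.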
Learning Models for Actionable Recourse

Ross, Alexis; Lakkaraju, Himabindu; Bastani, Osbert (January 2021, Advances in Neural Information Processing Systems)

Competency Problems: On Finding and Removing Artifacts in Language Data

https://doi.org/10.18653/v1/2021.emnlp-main.135

Gardner, Matt; Merrill, William; Dodge, Jesse; Peters, Matthew; Ross, Alexis; Singh, Sameer; Smith, Noah A. (January 2021, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing)

Much recent work in NLP has documented dataset artifacts, bias, and spurious correlations between input features and output labels. However, how to tell which features have “spurious” instead of legitimate correlations is typically left unspecified. In this work we argue that for complex language understanding tasks, all simple feature correlations are spurious, and we formalize this notion into a class of problems which we call competency problems. For example, the word “amazing” on its own should not give information about a sentiment label independent of the context in which it appears, which could include negation, metaphor, sarcasm, etc. We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account, showing that realistic datasets will increasingly deviate from competency problems as dataset size increases. This analysis gives us a simple statistical test for dataset artifacts, which we use to show more subtle biases than were described in prior work, including demonstrating that models are inappropriately affected by these less extreme biases. Our theoretical treatment of this problem also allows us to analyze proposed solutions, such as making local edits to dataset instances, and to give recommendations for future data collection and model design efforts that target competency problems.
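The statistical test for dataset artifacts mentioned above can be illustrated with a rough sketch. It assumes binary labels, single-token features, and a null hypothesis (the competency assumption) under which no single token is informative, so P(y = 1 | token) equals a fixed rate p0 (0.5 for balanced labels). Each token's observed label rate is compared to p0 with a normal-approximation z-test, using a Bonferroni correction across tokens. The function names, the two-sided form, and the correction are illustrative assumptions; the paper's exact procedure may differ.

```python
import math
from collections import Counter
from statistics import NormalDist


def token_z_scores(examples, p0=0.5):
    """z-score of P(y = 1 | token present) against the null rate p0, per token."""
    n_present = Counter()   # examples containing the token
    n_positive = Counter()  # of those, examples labeled 1
    for tokens, label in examples:
        for tok in set(tokens):  # count each token once per example
            n_present[tok] += 1
            n_positive[tok] += label
    scores = {}
    for tok, n in n_present.items():
        p_hat = n_positive[tok] / n
        scores[tok] = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return scores


def flag_artifacts(examples, alpha=0.01, p0=0.5):
    """Tokens whose label rate differs significantly from p0 (two-sided, Bonferroni)."""
    scores = token_z_scores(examples, p0)
    if not scores:
        return {}
    corrected_alpha = alpha / len(scores)  # Bonferroni over all tokens tested
    threshold = NormalDist().inv_cdf(1 - corrected_alpha / 2)
    return {tok: z for tok, z in scores.items() if abs(z) > threshold}


# Usage: a tiny made-up sentiment set where "amazing" always co-occurs with y = 1.
data = [(["the", "amazing", "plot"], 1)] * 100 + [(["the", "dull", "plot"], 0)] * 100
print(flag_artifacts(data))
```

Because the z-score grows with the square root of a token's count, the same small deviation from p0 becomes significant as the dataset grows, which is consistent with the abstract's point that larger realistic datasets deviate further from competency problems.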
